The embedding model is trained using 1 billion words from web-crawled corpora for each language (from the WAC corpora: Baroni, et al., 2009; and Aranea corpora: Benko, 2014) using skip-grams with 500 dimensions. These embeddings are segmented into categorical domains using k-means clustering (k = 100). The idea behind these three types of representation is that a particular slot in a construction can be defined or constrained at the lexical, syntactic, or semantic level. These representations thus form the basic alphabet of the algorithm.